Improved Prosody Module in a Text-to-Speech System
Authors
Abstract
This paper describes the newly developed prosody module of our text-to-speech (TTS) system. We present two main contributions to its establishment and improvement. First, based on the potential factors influencing the prosody parameters (duration, pitch, and intensity), we build a prosody model as the groundwork of this module; it is superior to the former rule-based model in generating natural prosody. Second, because the current model is flawed in its prediction of the pitch contour, we employ a technique named “Soft Template Mark-up Language” (STEM-ML) to improve the smoothness of intonation, which has a crucial influence on the naturalness of synthetic speech. Evaluation results indicate that the new prosody model is precise enough to predict reliable prosody parameter values and that, with the STEM-ML technique, the prosody module yields a further 14.75% reduction in the root mean square (RMS) error of the predicted pitch contour.
Similar resources
Synthesis of Spoken Messages from Semantic Representations. Semantic-Representation-to-Speech System
A semantic-representation-to-speech system communicates orally the information given in a semantic representation. Such a system must integrate a text generation module, a phonetic conversion module, a prosodic module and a speech synthesizer. We will see how the syntactic information elaborated by the text generation module is used for both phonetic conversion and prosody, so as to produce the...
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One of the interesting topics in the multimedia domain concerns enabling computers to produce speech. Speech synthesis grants computers the human ability of speech production. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...
Two-stage prosody prediction for emotional text-to-speech synthesis
In this paper, we adopt a difference approach to prosody prediction for emotional text-to-speech synthesis, where the prosodic variations between emotional and neutral speech are decomposed into the global and local prosodic variations and predicted using a two-stage model. The global prosodic variations are modeled by the means and standard deviations of the prosodic parameters, while the loca...
Text-to-Speech Synthesis for Mandarin Chinese
A Text-To-Speech (TTS) synthesizer is a computer-based system that is able to automatically read text aloud, regardless of whether the text is introduced by a computer input stream or by a scanned input that is submitted to an optical character recognition (OCR) engine. TTS synthesis can be used in many areas, such as telecommunication services, language education, vocal monitoring, multimedia, and as ...
The Prosody of Discourse Structure and Content in the Production of Persian EFL Learners
The present research addressed the prosodic realization of global and local text structure and content in the spoken discourse data produced by Persian EFL learners. Two newspaper articles were analyzed using Rhetorical Structure Theory. Based on these analyses, the global structure in terms of hierarchical level, the local structure in terms of the relative importance of text segments and the ...